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Abstract. Experimental design is crucial for inference where limitations in the data collection procedure are 
present due to cost or other restrictions. Optimal experimental designs determine parameters that 
in some appropriate sense make the data the most informative possible. In a Bayesian setting this 
is translated to updating to the best possible posterior. Information theoretic arguments have led 
to the formation of the expected information gain as a design criterion. This can be evaluated 
mainly by Monte Carlo sampling and maximized by using stochastic approximation methods, both 
known for being computationally expensive tasks. We propose a framework where a lower bound of 
the expected information gain is used as an alternative design criterion. In addition to alleviating 
the computational burden, this also addresses issues concerning estimation bias. The problem of 
permeability inference in a large contaminated area is used to demonstrate the validity of our ap¬ 
proach where we employ the massively parallel version of the multiphase multicomponent simulator 
TOUGH2 to simulate contaminant transport and a Polynomial Chaos approximation of the forward 
model that further accelerates the objective function evaluations. The proposed methodology is 
demonstrated to a setting where field measurements are available. 

Key words. Bayesian experimental design. Expected information gain. Stochastic optimization. Polynomial 
Chaos, Two-phase transport 
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1. Introduction. Remediation of a polluted subsurface presents an increasingly common 
need in most urban areas undergoing accelerated expansions. Risks associated with these 
pollutants range from health consequences to financial costs of both monitoring, remediation 
and insurance. The fact that the subsurface can never be completely characterized is a key 
contributor to these risks, and credible procedures for improving this characterization have 
far reaching consequences across the spectrum of constituency. What is ultimately needed 
is a sufficient assessment of the subsurface which depends on the physics of subsurface flow 
and a knowledge of the initial conditions, and an assessment of the flow functionals that are 
relevant to risk assessment. As a first step, in the present paper, we address the issue of 
optimal subsurface characterization under conditions of limited resources. 

A vast amount of research works have been dedicated to this challenge and have offered 
application-specific answers throughout the years with a Bayesian decision analysis framework 
often being present. Among the earliest ones, the works by [7, 17], discussed the worth of 
correlated hydraulic conductivity measurements that was evaluated depending on the revenue 
due to the resulting uncertainty reduction and the cost of obtaining them. In the excellent work 
by [18], a sequential sampling methodology was developed and the issues of where to obtain 
the next sample and when to cease sampling were addressed using an analogous worth-of- 
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information framework. More recently [11], a Kullback-Liebler formalism that maximizes the 
worth of information of an experiment was developed, and the worth of additional observations 
characterized. While that work ignored the physics of flow in porous media by developing a 
kriging model for the measured concentrations in the subsurface, it does lay the foundations 
for the present work. A common characteristic in all the above works and many that followed, 
is that the data-worth analyses, Bayesian in nature, are divided in three main phases; the 
prior, the preposterior and the posterior analyses with the first and the last being the well 
known steps of any Bayesian method and the second being the phase where the decisions 
about the design parameters are to be made. This typically involves the suggestion of a 
utility function, expressed often in monetary terms that quantifies the worth of collecting 
some specific data and then evaluating the average worth by taking the expectation of this 
function. The above mentioned works and numerous others [24, 26, 3] verified a general 
conclusion: that the location or any other design parameters characterizing the best samples 
may depend on the model uncertainty but also on the decision to be made with the latter 
factor usually adding an economic flavor in the data worth approach. Although in most 
real-world problems concerning the geosciences community, this might be an efficient way to 
address such issues, it is of critical importance, from a mathematical perspective, to be able 
to distinguish the role of model uncertainty in optimal designs and not much attention has 
been received on this direction. 

It is the purpose of this paper to present an approach to the optimal design problems 
arising in contaminant transport applications that meets each problem’s objectives from an 
information theoretic perspective, that is to explore the use of other design criteria as can¬ 
didates in the preposterior phase of any data-worth framework, that provide maximum in¬ 
formation about model uncertainty without being concerned about economic costs, but still 
providing low risk results for decision making. An extensive statistical literature is available 
on experimental design methodologies for linear regression models summarized in [2, 6] with 
criteria stated as functionals of the Fisher information matrix. Bayesian counterparts of these 
criteria as well as additional design criteria for nonlinear models have also been suggested (see 
[4] for a review). Typically these are taken as the expectation of some appropriately chosen 
utility functions or approximations of them. 

This is the direction followed in this work. We choose a design criterion for data collection 
procedure according to an objective function that maximizes information gain about the 
uncertain parameters. As a natural consequence of dealing with realistic, complex and often 
high-dimensional non-linear models, computationally intensive techniques such as the use of 
model surrogates and Monte Carlo methods are involved in order to evaluate and optimize 
the design criterion. The Bayesian parameter inference results that are obtained from such 
optimal designs can then provide a rational basis for decision making. All the above mentioned 
steps can be summarized as shown in the flowchart in Fig. 2 . The methodology developed in 
this paper is demonstrated on an actual site shown in Fig. 1. Soil-type data was collected at 
this site at various depth at boreholes located along the solid lines that criss-cross the site, as 
shown in the soil boring map in Fig. 1. Concentration values were collected at the black dots 
throughout the site. With reference to this specific case study, the sequence of action that we 
investigate consists of the following steps: 

1. Initial spills are assumed to have occurred at the location of the storage tanks shown 
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Figure 1: Aerial photo showing the location of storage tanks (left) and soil boring map (right). 


in Figure 1. 

2. Mean and variances of permeability values are associated to soil types throughout the 
site, resulting in lognormal initial probabilistic models of permeabilities. 

3. Physics of flow through porous media yield a corresponding distribution for the con¬ 
centration of pollutants throughout the site. 

4. For a given monitoring design consisting of a set of specific five spatial locations where 
concentrations are to be observed, a distribution of these observables can be obtained 
from the previous step. 

5. From this distribution of observables, the likelihood for each possible value of ob¬ 
servable is calculated, and the Bayesian posterior distribution for that observable is 
evaluated. Note that at this step, no concentration data is assumed to have been 
collected yet. 

6. The Kullback-Leibler (KL) distance between the prior and the posterior distributions 
is evaluated as a measure of information gain. 

7. The KL distance is integrated over all possible values attained by the observables at 
the fixed design locations. 

8. The optimal monitoring locations are determined by maximizing this averaged KL 
distance. 

9. Measurements of pollutants are taken from the actual field data at the optimal loca¬ 
tions. 

10. The Bayesian posterior distributions for permeabilities are calculated based on these 
observations. These posteriors provide an updated description of the permeabilities 
throughout the site and can be used to improve the prediction of concentration maps 
over time. 

While a stopping criterion for this Bayesian update has been introduced elsewhere [11], it 
is outside the scope of the present investigation. The paper addresses computational and 
algorithmic issues required for the implementation of the above steps and provides insight 
into the computed optimal monitoring strategies, and is organized as follows. In Section 2 
we introduce the design criterion used for experimental design that provides optimal Bayesian 





























4 


PANAGIOTIS TSILIFIS, ROGER G. GHANEM AND PARIS HAJALI 


PRIOR 

ANALYSIS 


Mok* 

OMumptieni 
boMd on 
ovoBobia 
InfefTTiqHon 


PREPOSTERIOR 

ANALYSIS 






Figure 2: A flowchart outlining the steps in prior, preposterior and posterior stages of the 
optimal design methodology. 


inference results and a lower bound that is used as an alternative in the optimization procedure. 
The stochastic approximation framework required for maximizing the objective function, is 
described in Section 3. Section 4 provides a toy example in order to justify the use of the lower 
bound estimate instead of the actual expected information gain as the design criterion. Then 
our methodology is applied on a real site in Section 5. Precisely, the model is described in 
subsections 5.1 and 5.2, then a surrogate is constructed in subsection 5.3 in order to accelerate 
simulations of computationally intensive procedures. The experimental design analysis is 
performed in subsection 5.4 and the results are validated in subsection 5.5 by estimating 
the Bayesian update of the uncertain parameters based on data that is generated both for 
a hypothetical scenario and from field measurements. Our conclusions are summarized in 
Section 6. 

2. Bayesian experimental design. 

2.1. Design criterion. Our main interest in the present work is to define and evaluate 
a particular experiment which, constrained by fixed resources, will reduce the prediction 
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uncertainty in some well-defined optimal sense. Let d denote what we refer to as the set of 
design parameters. For each choice of these parameters, the experiment design is fixed. Thus d 
could, for instance, consist of smoothing kernels that parameterize measurement instruments. 
Alternatively, as we do in the present work, we construe d S as a vector that dehnes 
the spatial coordinates in the horizontal plane of m points at which observations will be 
obtained. Clearly, d could also consist of nonlinear functionals. The numerical values of 
the observations will be denoted by y G M"*. Prior to conducting the experiments, y is 
a random vector. Following the experiment, the numerical values attained by this vector 
will be used to condition the inference. We make the restrictive assumption that optimal 
reduction of uncertainty in model-based prediction is attained through an optimal reduction 
of uncertainty in model parameters which we denote by 6. Relaxing this assumption, while 
numerically tedious, does not present conceptual difficulties. 

2.1.1. Expected information gain. We are thus interested in inferring the unknown pa¬ 
rameters 6 that govern the behavior of a physical process. We model these parameters as a 
vector of random variables. In a Bayesian setting, this inference is carried out by updating 
the prior distribution p{6) with a posterior one, namely p(0|y,d), given that some specihc 
data y was observed for particular design parameters d. The posterior distribution is updated 
according to the rule 

• 1 ) p{e\y,d) = 

where the multi-dimensional integral, p{y) = f p{y\0, d)p{9)d6, is referred to as the evidence. 
Using a Shannon information approach we can define the information gain after updating the 
distribution of 6 to be the Kullback-Leibler (KL) divergence [20] of the prior and the posterior 
distributions, that is 



( 2 . 2 ) 


Dkl [p(-|y,d)||p(-)] 


'0 


p(0|y,d)log 


p(^|y,d) 

, p{^) 


dO. 


We are interested in quantifying the information gain before the data is actually collected. In 

the same spirit we define the expected information gain, first proposed by [22] as 

(2.3) 


U{d)= [ DKL\p{-\y,d)\\p{-)]p{y\d)dy 

Jy 


p(0|y,d)log 


yje 


p(^|y,d) 

p{e) 


dO p{y\d)dy 


This can be interpreted as follows: The contribution of any possible output that is used 
as a set of data y to update to the posterior distribution is given as the KL divergence of 
the two distributions. Before the data is collected, C/(d) provides us with a measure of the 
average information to be gained. This is a function only of the design parameters d. It is 
therefore natural to assume that the choice of d* that offers on average the most informative 
observations and thus, is the optimal experimental design, is the one that maximizes 17(d) 
and so it satisfies 


d* = arg max 17(d). 
del? 


(2.4) 
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Evaluation of the above objective function and therefore its optimization is not a triv¬ 
ial task. At first, one can see that the posterior, being unknown beforehand, needs to be 
evaluated or approximated. In [23], the posterior was replaced by its Laplace approximation 
and ?7(d) was estimated with a sparse quadrature rule. In [31] an alternative form of the 
objective function was derived by using Bayes’ rule to express the posterior in terms of the 
evidence, the likelihood and the prior distributions and estimates were proposed via Monte 
Carlo methods. In both approaches, the maxima were identified by using an exhaustive grid 
search over the whole design space and limitations due to computational expense were re¬ 
ported. Evaluation of the design criterion on all points of the design space can easily become 
infeasible in applications where either higher dimensional design parameters are involved or 
an expensive forward solver is incorporated, hence the need for iterative search strategies is 
necessary to detect the optimal value. In [15, 16] it was demonstrated that stochastic ap¬ 
proximation methods [35] are well adapted to the present situation, when the Monte Carlo 
estimate of the objective is used and this is the approach we follow in this paper. However, 
instead of simply adapting the methodology developed in these works, we further derive a 
lower bound of the expected information gain and its corresponding Monte Carlo estimate 
to be maximized. The reason for doing so is to overcome computational obstacles that arise 
from our application: The direct Monte Carlo estimator entails a discretization of the double 
integral appearing in equation (2.3) and has been shown to have a bias that is inversely pro¬ 
portional to the number of samples in the inner sum of the estimate, requiring a very large 
number of inner loop samples for the bias to be negligible [31] (see also Appendix B in [15] 
for a numerical study). Furthermore, the variance of that estimator is controlled only by the 
number of the outer loop samples. Expressions for both the bias and variance are given in A. 
For applications such as the one included in this paper, using tens or hundreds of thousands 
samples can easily become prohibitive. On the contrary, the lower bound derived below can 
be easily seen to be unbiased and its variance is controlled by the product of the number of 
samples used in both loops, something that allows us to achieve the same level of accuracy in 
our estimate with a much lower number of samples. 

2.1.2. Derivation of a lower bound. By substituting p(0|y, d) from eq. ??, we write 


U{d) 



p{d\y,d) 

. Pi^) 


p{e,y\d)dedy 


y 


e 


log [p{y\6, d)] p{y\e\d)p{e)dedy 



log b(y)]p(y|d)dy 


and by denoting with 7i[q{u)] = — f log [(/(c*^)] q{u)du the entropy of a distribution g(a^) we 
take 


U{d) = - [ n\p{y\e,d)]p{0)d0 + n[piy\d)]. 

Je 

In what follows we take the likelihood as a Gaussian distribution with density 


p{y\e, d) = ( 27 r)-™/ 2 | 5 ]|-V 2 exp |-i [y - G{e, d)]^ [y - 0(0, d)] | . 
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where the mean is the model output Q{0,d) and the covariance matrix is X). This is a 
common choice for models where the observables y are defined as the model prediction plus 
some measurement errors 

y = 0(6, d) + e 

with the latter being normally distributed with density M{0, S). In general, the measurement 
error can be related to the model output through a proportionality factor or even a more 
complex relation which is typically incorporated in the covariance matrix S. This implies 
that X) can depend implicitly on both the uncertain and the design parameters. For our 
purposes we make the rather simple assumption that no such dependence is involved and 
X) has fixed entries. Further details on the exact values of the covariance matrix including 
independence among the measurement errors is provided later in our applications. 

For the above choice of the likelihood function the entropy can be calculated and is simply 

^[p(y|0,d)] = i{m + log[(27rnS|]} 

and we finally get 

(2.5) U{d) = -^{m + log [(27r)-|S|]} + n\p{y\d)]. 

At last, using Jensen’s inequality, we derive a lower bound for 17(d). Namely we have 

^[P(y|d)] = / -log[p(y|d)]p(y|d)dy >-log [ p‘^{y\d)dy 

Jy VJy \ 

and we dehne the lower bound of U (d) as 


( 2 . 6 ) 


Uiid) = --{m + log [(27r)”'|5]|]} - log 


i 


p\y\d)dy 


Note that since the common first term of 17(d) and 17L(d), as they appear in equations (2.5) 
and (2.6), are independent of d, one eventually needs only to maximize the second term, 
namely the entropy of the marginal distribution of the data or its lower bound. This idea 
is not new and has been previously used in linear regression models [32] and in geophysical 
applications [37]. 


2.1.3. Estimation of the lower bound. As mentioned above, the optimization problem 
is equivalent to minimizing 

Ulid) = [ p\y\d)dy. 

Jy 

After expanding the evidence function and writing 


U*L{d)= III p(y|0i,d)p(y|02,d)p(0i)p(02)d0] 
Jy Je Je 


d02dy 


ly 

we can see that a Monte Carlo estimate of the above is 

N,M 


um = 


1 


NM 


i5(y*l^^d) 


ki=i 


(2.7) 
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where { 0 *}, { 0 ^}, i = 1 ,... ,N,j = are i.i.d. samples from p{0) and {y*} are i.i.d. 

samples from p(y| 0 *, d). 

3. Stochastic optimization. 

3.1. Simultaneous perturbation stochastic approximation. Maximization of the objec¬ 
tive function derived in the above section can in general be a difficult task. In cases where 
the design parameters are of high dimension or an expensive forward solver is involved in the 
evaluation of 172(d), a direct grid search can easily become prohibitive. Since only a noisy 
estimate of the actual objective function to be maximized is available, we turn our attention 
to stochastic approximation methods. These are algorithms that approximate roots of noisy 
functions and can be used for optimization problems to find the roots of the gradient of the 
objective function to be optimized. This is done with an iterative procedure of the general 
form 


(3.1) dk+i = dk - akg{dk), k>0 

where {ak}k>o is a sequence of positive pre-specified deterministic constants, known also as the 
learning rate of the algorithm and 5 (d) = Vdf7(d) is the gradient of the objective function 
with respect to d. Algorithms that use an explicit expression for the gradient are called 
Rohhins-Monro algorithms [30] and those where a finite-difference scheme is employed, are 
called Kiefer- Wolfowitz algorithms [19]. For our purposes we use an algorithm from the Kiefer- 
Wolfowitz family, namely the Simultaneous Perturbation Stochastic Approximation method 
(SPSA), proposed by Spall [35, 36]. What makes this methods attractive versus others is the 
fact that only two forward evaluations are required for the gradient estimation. The updating 
step of the method is given by 

(3.2) d;j_l_]^ dk Okliki^k') j 


where 


(3.3) 


5fc(dfc) 


^2(^fc + CfcAfc) - Ufidk - Ck^k) 
2 cfc 


A 


-1 -I 

k,l 


A 


-1 

k,nd . 


{A + kplY' {k + lp 

and Ofc, Cfc, a, 7 are positive parameters whose values affect the convergence rate and the 
gradient estimate. General instructions on how to choose the appropriate values for each 
problem are given in [36]. The vectors are random vectors with coefficients Ak^i drawn 
independently from any zero-mean probability distribution such that the expectation of [A^^] 
exists [35]. A common choice, which we used in our analysis, corresponds to Ak^i = 2{Z — 1/2) 
where Z ~ Bernoulli(p) with success probability p = \- 

4. Example: Nonlinear model. 
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4.1. The model and experimental scenarios. The main goal in this example is to explore 
the accuracy of the lower bound estimate as a substitute for the direct maximization of the 
expected information gain [15] and provide a numerical comparison of the two approaches. 
This is achieved by evaluating the two objective functions for a simple algebraic model which 
is inexpensive to evaluate and is nonlinear with respect to both the uncertain and design 
parameters. Consider thus the model where the observable quantity y is dependent on k and 
d as 

(4.1) y{K,d) = G{K,d) + e 

(4.2) = K^d? + + e 

where G{K,d) is the model output and the noise is taken to be e ~ AA(0,10“"^). For prior we 
choose initially k ~ ZY(0,1) and later on we discuss a few more cases. Suppose we have control 
over d £ T> where P = [0,1] is the design space and we are interested in inferring k. We 
explore two cases, hrst the case where inference is carried out using a single observation of y 
and second the case where two observations of y can be obtained, corresponding to different 
values of d. One can think of the design parameter d as the location where y is observed. 
Before observing y, we would like to know the value of d that would make our observations 
the most informative ones. Both cases of this example have been studied in [15] using the 
direct Monte Carlo estimate of U{d). The hrst case was also studied in [23] using the Laplace 
approximation of the posterior distribution of k and then performing the integration with 
sparse quadratures. 

4.2. Results. We are using the expected information gain lower bound estimate as given 
in (2.7) as our criterion to determine the optimal design d for inferring k. For comparison, 
we also reproduce the results of [15] using the expected information gain estimate which from 
now on we call double loop Monte Carlo (dlMC) estimate. The exact expression of the dlMC 
estimate is given in A. Our computations are performed using an ensemble of samples with 
N = M = lO'^ for all different priors that we consider below, while keeping the error variance 
hxed. We demonstrate that the two estimate share the same maxima and their graphs have 
good quantitative agreement with the lower bound providing slightly less noisy values. 

4.2.1. Single observation. Fig. 3 shows the values of estimates of U^{d) for the design 

of one experiment after using a 101-point uniform partition on the design space P = [0,1] 

for N = M = 10^ and for N = M = 10^ as well as the estimates of dlMC. All cases show 

the existence of two local maxima at d = 0.2 and d = 1. As expected, our estimate always 

takes values slightly smaller than dlMC, as the former is a lower bound of the latter. Due to 

the fact that the first term in our estimate (the Gaussian entropy) is a constant, while the 

first term of the dlMC estimate is a Monte-Carlo approximation of the very same constant, 

we observe that the lower bound is a smoother curve for the same values of N, M. Following 

the slope analysis of [15] we also present the results for two different choices of prior, namely 

1 /“? 

when K ~ P(0, Ke) and when n U{Ke, 1) where Ke = [(1 — e °-®)/2.88] ' which is the point 
where the slope of G{k, 1) becomes greater than the slope of G(k, 0.2). For the former case 
we obtain a global maximum at d = 0.2 whereas for the latter, the maximum is at d = 1. At 
some particular points we can observe that although our estimate is only a lower bound of the 
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Figure 3: Expected information gain lower bounds and dlMC with N = 10“^, M = 
estimated over T> = [0,1] for three different priors. 


dlMC estimate, the value of the dlMC might fall below the value of U1{d). This, in addition 
to the fact that we compare noisy estimates of a deterministic quantity, can also emerge as 
the result of the bias of dlMC which is not negligible for our choice of M (number of inner 
loop samples). For a more detailed discussion on the bias of dlMC for this example see [15] 
and for the analytic form of the bias see [31]. 

4.2.2. Two observations. We calculate the values of Ul{di,d 2 ) for N = M = 10^ over 
V X 2? = [0,1]^ and we compare them with those of dlMC in Fig. 4. Similar qualitative results 
as in case 1 can be observed. For k ~ 2/(0,1), the points where maximum is attained are 
{di,d 2 ) = (0.2,1) and {di,d 2 ) = (1,0.2) which are combinations of the local maxima in the 
1-dimensional case. That means that, if two observations can be afforded, they should be 
taken at the points where local maxima exist for the one observation scenario. The order is 
insignificant since, as we see, the Ul{di,d 2 ) surface is symmetric with respect to the di = c /2 
line. Similar conclusions are drawn for k ~ 2/(0, k^) and k ~ U{Ke, 1) where the maxima are 
at {di,d 2 ) = (0.2,0.2) and (c/i,d 2 ) = (Ij 1) respectively, as already expected from the results 
of case 1. 

5. Example: Case study. This second example demonstrates the application of the for¬ 
malism to a subsurface pollution characterization problem for an actual site where field data 
for permeability and concentrations are available. 
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(a) Lower bound estimate UL{d) 




Figure 4: Expected information gain lower bound estimate and dlMC estimate with N = 10^, 
M = lO'^ for the 2-dimensional design, estimated over V = [0,1]^ for different priors. 


5.1. Description. We are interested in performing Bayesian inference on the uncertain 
parameters that affect the transport of pollutants in a contaminated area. Such a procedure 
will decrease the uncertainty regarding the extent and location of the plume and will further 
affect the cost and other important aspects of soil remediation. When limited resources are 
available for experimentation and data collection, it is of great importance to develop exper¬ 
imental design strategies that will enhance the quality of data. Our current study concerns 
a contaminated site located in Santa Fe Springs, California. Previous data consisting of soil 
types is available from boreholes located along four cross sections across the site. Each bore¬ 
hole reaches a depth of 20m. Accordingly, soil type is defined as a categorical variable with six 
possible states. The field and the locations of the boreholes are shown in Figs. 1 and 5 (left). 
The available soil data is used to construct a domain that can be used in our forward model to 
simulate the transport of pollutants. The construction of a domain that can be regarded as a 
good approximation of the real situation, is achieved by the stratigraphic modeling capabilities 
of the Groundwater Modeling System (GMS) software [1]. More specihcally, using a standard 
inverse-distance weighting based interpolation scheme, we assign soil types for all locations in 
the area of our interest. The stratigraphy of the resulting domain as well as the extent of each 
soil are shown in Figs. 5 (right) and 6. Next, the domain was discretized to a 40 x 60 x 10 
grid where each cell has dimensions 15.4m. x 6.7m. x 2m. and carries the information of the 
soil type present in that location. The grid is used as our finite volume discretization in the 
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subsequent TOUGH2 simulations and each of the six different soil types present in our domain 
are assigned different permeability values which, in our study, are considered to be random 
and are the only source of uncertainty. 



Figure 5: Boreholes (left) and final domain stratigraphy (right) with the z-axis being stretched 
for the sake of illustration. 


5.2. Simulating flow and transport using TOUGH2. 

5.2.1. EOS7r module. We employed the multiphase simulator TOUGH2 [28] and its 
massively parallel version TOUGH2-MP [40] with the EOS7r module to simulate groundwater 
flow and contaminant transport. TOUGH2 provides us with a finite volume solver which 
discretizes the mass and energy equations over space and time and updates a set of primary 
variables that consist the solution of the governing equations by estimating the increments 
at each time step with a Newton-Raphson method. The common mathematical form of 
the equations for multiphase fluid flow include several thermophysical parameters such as 
density, viscosity, enthalpy etc. which are determined by the various ”EOS” (equation-of- 
state) modules. The EOS7r module [27] used here, is mainly intended to provide radionuclide 
transport capability but can be as well used for transport of volatile and water soluble organic 
chemicals (VOCs). Ghange in concentrations is caused in general for three reasons: transport, 
decay and adsorption. Volatilization of the VOCs in all phases is modeled by Henry’s law 
and occurs by advection and diffusion. Decay is modeled by a first-order decay law. In 
case of radionuclide transport, this is interpreted as radioactive decay but in the case of 
organic contaminants it can be explained as biodegradation. Adsorption is dependent on the 
distribution coefficient that characterizes each rocktype. EOS7r in total can simulate transport 
of five components in a two-phase flow, namely water, air, brine, a parent contaminant and a 
daughter contaminant in aqueous and gaseous phases. 

5.2.2. Santa Fe site. Eor our problem we are interested in simulating the spreading 
of organic contaminants in a field located in Santa Fe Springs, CA which is approximately 
600m. X 400m. and only 20m. deep. This is discretized in TOUGH2 to a grid consisting of 
40 X 60 X 10 = 24000 active cells where each cell has dimension 15.4m. x 6.7m. x 2m., as 
explained in the previous paragraph and is assigned a rocktype according to its soil type. We 
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(d) Silty sand (e) Sand with 10% silt (f) Poorly graded sand 


Figure 6: Soil morphology of the domain. The z-axis has been stretched for the sake of 
illustration. 


focus on investigating the propagation of uncertainty emerging from the unknown permeabil¬ 
ities K = (ki,...,K6 ) of the six different soil types the are present on our site, through the 
transport process of the VOCs. The porosity is taken to be uniform (j) = 0.35 all over the 
domain. Although we are interested in the transport mainly of petroleum hydrocarbons whose 
weathering processes are in general known to include adsorption and biodegradation effects, 
for the purposes of this study we will consider these effects to be of negligible importance 
by assigning the distribution coefficients for adsorption to be zero and the half-life parameter 
~ 10^*^ so that we have no decay effects. Thus our model focuses only on the volatilization 
properties of the VOCs. We choose the values for the molecular weight and inverse Henry’s 
constant parameters to correspond to those used for describing transport of petroleum hydro¬ 
carbons and specifically those that are mostly detected using gas chromatography techniques, 
named Gasoline Range Organics (GRO). Typically GRO are the subsets of hydrocarbons in 
the G5-G12 range and include hydrocarbons such as Benzene, Toluene, Ethylbenzene, m-, o- 
and p-Xylenes, Naphthalene and Acenaphthene. Their molecular weights are in the 78 — 154 
grams range. In our case we arbitrarily set the molecular weight to be 112.4 grams which is 
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Figure 7: Locations where initial contaminant injections are placed. 


a value close to the average GRO hydrocarbon. 


Table 1: Material and initial parameters used in our simulations. 


Parameter 

Symbol 

Value 

Porosity 

</> 

0.35 

Tortuosity factor 

To 

0.1 

Relative permeability parameters: 

X 

0.457 

- 

Sir 

0.15 

- 

Sis 

1 

- 

Sgr 

0.1 

Gapillary pressure parameters: 

A 

0.457 

- 

Sir 

0 

- 

I/Pq = a/Prug 

5.105e-4 

- 

P 

^ max 

10^ 

- 

Sis 

1 

Diffusivities (all k): gas phase 

d^a 

10-6 

aqueous phase 

4 

10-10 

Molecular weight 

- 

112.4 

Inverse Herny’s constant 

- 

2.1 • 10-® 

Half-life parameter 

- 

1050 

Initial pressure 

P{0) 

1.013 • 10^ 

Initial gas saturation 

Sg 

0.75 

Temperature (constant) 

T 

20° 


For our forward evaluations of the model with TOUGH2 we consider that only 3 com¬ 
ponents are present by assigning the brine and daughter radionuclide mass fractions to be 
zero. We assume isothermal conditions with the temperature being constant at 20°G and 
no-flux boundary conditions. The mathematical formulation of the flow and transport model 
is described in detail in Appendix B and the values of the model parameters that are assumed 
known are given in Table 1. Since we want the pressure and gas saturation to be close to 
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steady-state conditions by the time the contaminants are released to the ground, we first run 
the model for a time period until T = 13.3 years, where the initial mass fractions for the 
contaminants is zero. The outputs of pressure and gas saturation are subsequently used as 
initial conditions for the transport simulation. We rerun the model after assigning initial 
aqueous phase solubilities for the VOC. We denote these initial conditions with yo. Their 
locations at time t = 0 are taken to be approximately in the areas were the storage tanks 
are located as can be seen in Fig. 7. For our purposes we add 164 inactive (zero volume) 
cells to the surface layer of our grid and initial injections are assigned. The exact values were 
chosen to be yo = 0.1 -|- ppb (parts per billion) where U are numbers randomly generated 
from a Uniform distribution with support on (5 • 10“®, 10“^) and are considered known. The 
transport simulation runs for the same time period as the flow simulation. 

5.3. Developing a surrogate model. Due to the complexity of our model, only a single 
simulation of the transport flow with TOUGH2 requires several minutes to finish. Thus, 
implementing our experimental design framework, including the minimization of the expected 
information gain lower bound with SPSA which requires thousands of objective function 
evaluations, becomes impractical. It is therefore necessary to create a surrogate model that 
would provide us with forward evaluations of the model that are significantly cheaper to obtain 
than running TOUGH2. 

5.3.1. The prior and input transformation. The unknown physical parameters of our 
problem are the permeabilities k = (ki, ...,kq) of the six materials making-up the subsurface 
at the site of interest. We choose all k* to be independent with a lognormal prior distribution, 
that is 

(5.1) p(logKi) =W(-23.5,4) 

The choice of the mean /r = —23.5 and variance cj^ = 4 are made such that our prior covers 
a range for permeability values in the order of magnitude 10“®cm^ to 10~^^cm^ which corre¬ 
sponds to semi-pervious materials. We find this a rather general prior that corresponds solely 
to our knowledge that the materials present in our domain are silt, clay, silty sand, clayey 
sand, sand with 10% silt and poorly graded sand. 

Now if we let F{x) = P{Ki < x) to be the cumulative distribution function of Kj, then we 
have that := F{Ki) ~ f7(0,1) and for ^ = (^i, ...,^6) we define G{$) = 0(F“^(^),d) where 
F(^) = n^=i to be our model output where the input is a set of 6 independent standard 
uniform random variables. 

5.3.2. Polynomial chaos expansion. We make use of the property that our random out¬ 
put Q{$) is a physical process with finite variance, therefore it is a square-integrable random 
field £ L^(M”*), dehned on the probability space ([0,1]®, T", P) and admits a polynomial 
chaos representation of the form [8, 34, 38] 

(5.2) g(^)= 

cx,\ot\<oo 

where a = (ai, ...,ad) and ai G N™ for i = l,...,n is a multi-index with modulus |q:| = 
ai + ... + Un-, each is a vector in M™, ^ is a second-order random variable defined on 
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([0,1]®, P) with values in M”* and the functions form a complete set of orthonormal 
functions that satisfy 

(5.3) E[^Q,(^)'h^(^)] = (5 c/ 3 = 5ai/3i X • • • X 

Typically the random variable ^ has independent components that follow a Gaussian, Uniform, 
Gamma or Beta distribution. The basis function then is chosen to consist of multidimensional 
polynomials = V’oi(Ci) x • • • x ’ipa^{an), where 'ijjai is respectively Hermite, Legendre, 

Laguerre or Jacobi polynomials of order Oj. According to the input transformation in the 
previous paragraph, we take the components of $, to be ZY(0,1) distributed and therefore the 
polynomials in the expansion will be Legendre. For computational purposes we work with a 
truncated version of the expression (5.2) by writing 

(5.4) Gr{^)= Y. Pc.4/«(^). 
where the number of terms in the above expansions is 

(5.5) iV„ = |{„eN",0<M<r)|=^L±^^. 

Eq. (5.4) provides an accurate approximation of the true model output y as long as the 
coefficients are estimated in a fashion that they also incorporate a transformation of ^ to 
the uncertain parameters of the problem of interest. 

5.3.3. Coefficient estimation. In order for the expression (5.4) to be of use, we need 
to estimate the coefficients p^. This in general can be done with various methods, mainly 
categorized as intrusive [39] and nonintrusive methods. We use non-intrusive methods which 
are easier to implement and more convenient when the forward simulation is seen as a black 
box. Popular nonintrusive methods include approximating the coefficients by orthogonal 
projection of the output on the basis functions [9], which involves calculating multidimensional 
integrals. Other methods calculate the coefficients by solving a linear system of equations 
constructed after evaluating the model on a set of samples and then either interpolating 
these points (by choosing collocation points of the polynomial roots [21]) or minimizing the 
least squares error [41]. The last method is the one adapted here, namely we estimate the 
coefficients by linear regression taking advantage of the linear dependence of Gr(^) on p^. 
This requires the evaluation of G($) at Ng points and then for each component yc, 

c = 1, ..., 24000 of the output vector G = (yi, ■■■, y 24000 we solve the system 

(5.6) yc = 4'Pc, 

where yc = [y \,..., yc^Y' ■, ik is the Ng x N-p matrix formed by evaluating the polynomial basis 
functions at the Ng selected points and Pc are the vector with the unknown coefficients. The 
least squares solution of (5.6) is the one that minimizes ||'kpc — ydl^ and is given by 

Pc = (T^^)”i4'^yc 


(5.7) 
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provided that exists. This requires that Ng > N-p so that our system will be 

overdetermined. Again, due to the computationally demanding TOUGH2 forward simulator 
it is impractical to obtain tens or hundreds of thousand samples so we work with an ensemble 
of Ng = 1000 samples which, even on the massively parallel version of TOUGH2 (TOUGH2- 
MP), with a moderate number of processors used, takes around one week to be generated. This 
choice of Ng allows us to calculate the coefficients of an expansion of order up to r = 5. The 
points were randomly selected using Latin Hypercube sampling. In fact, the 5-order 

polynomial chaos expansion included 462 coefficients. This is close to Ng ~ 2Np which has 
been suggested as a good choice [14]. New results concerning the stability and L^-convergence 
of Polynomial Chaos approximations obtained via least-squares solutions have been recently 
reported by [5, 12], however such an analysis falls beyond the scope of this work. 

5.3.4. Goodness of fit and truncation error. The next step after obtaining the expansion 
coefficients is to test how well Qr performs as a surrogate. For validation purposes, we estimate 
the coefficients for r = 1,2,3,4 and 5 using all 1000 samples. We want to test how well the 
least squares solution fits the samples but also how close Qr is to Q in the sense. 

For the first test we employ the common statistic known also as coefficient of deter¬ 
mination [33] and provides a goodness-of-fit measure for our linear model. The statistic is 
defined as 


(5.8) 


RSSr 




where the residuals sum of squares is RSSc = ^f=i{yc — , c = 1,..., 24000. 

For the second test, we define the expected relative truncation error as 


(5.9) 


E[\Q,{^) - 

lE[|ac(OP] 


/[0,i]n l^c(^) -yc{^Tpi^)d$ 


c = 1,..., 24000 


where p{^) = 1 is the joint distribution of independent U{0, l)’s. The above can be estimated 
as 


(5.10) 




E1T|5c({')-!;o(?)P 


c= 1,..., 24000. 


Fig. 8 shows the boxplots of the R^ statistic, obtained from all components Q, of the 
expansions of order r = 1,...5 (left) and the boxplots of the relative errors for each order 
r = 1,..., 5 and for all the components of Q (right). Regarding the R^ statistic and particularly 
for r = 5 the median is 0.986 and the lower quartile is above 0.96 giving us a good fit for the 
75% of the expansions along the domain. A thorough look showed that the remaining outputs 
including outliers for which the fit is not stable correspond to points of the domain that are 
in the bottom layer and along the east boundary. Regarding the expected truncation error 
and specifically for the expansions of order 5 the median is 3.1 • 10“^ and the upper quantile 
is 10“^ giving us good approximations in the sense. Again the truncation error increases 
for the points in the bottom layers. To ensure that our experimental design methodology 
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implemented in the next paragraph is unaffected by possible instabilities of the Polynomial 
Chaos expansions, the model outputs produced for the bottom layer of our domain were not 
involved in our study. 


statistic 



Polynomial order r 



Polynomial order r 


Figure 8: Boxplots of the statistic (left) and relative errors (right) of the polynomial 
chaoses for all components of the output of order r = 1,5. 


5.4. Experimental design. 

5.4.1. Design settings. As mentioned above, our main goal is to perform Bayesian infer¬ 
ence on the permeability parameters that will provide us a tighter posterior that will decrease 
the uncertainty about the true values of the soil permeability. This will further enhance our 
current knowledge about the plume location and extent that can potentially result in devel¬ 
oping cost-efficient remediation strategies. In order to achieve such a goal, we are seeking the 
best locations from where data should be collected by employing the expected information 
gain lower bound as our design criterion. 

Unlike the analysis of the nonlinear algebraic model it is clearly understood that in this 
case, design of a single experiment, that is design for the collection of a single datum would not 
have important effects in the inference procedure and it is never performed in practice. In our 
setting we consider that for any location in the (x, ?/)-plane we can observe the concentrations 
y from the hrst 5 layers of our domain, that is up to 10m. depth, so 5 data points are available 
from each location. Without any loss of generality in our method we choose to find the design 
of experiments consisted of the 5 best locations in the (x, y)-plane from where data will be 
collected simultaneously to be used for inference. This is a moderate choice which appears 
to be satisfactory in order to validate our method. Further evaluation of the optimal number 
of data points is beyond the scope of the present study. The design parameters therefore are 
d = {di^x,di^y,d2,x,d2,y,...,d5,x,d^,y) where {di^x,di^y) e Pq = [0,600] x [0,400], i = 1,...,5 
and the design space for our problem is P = Pq. 

The observations y are subject to additional measurement noise as indicated from our 
Gaussian likelihood function involved in the expected information gain lower 

bound derivation. Here we have substituted our forward solver ^(k, d) with the polynomial 
chaos surrogate G{^)- The covariance matrix has been chosen to be S = fj^l 25 with = 10 
This is a rather simple choice that guarantees that measurement errors are independent. A 
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Figure 9: Scatter plots of the optimal points di for all optimization solutions d* of the SPSA 
algorithm. The red transparent areas denote the locations of the sources, plotted for compar¬ 
ison of the patterns. 


more sophisticated choice would be to take the standard deviation to be proportional to the 
observable quantity a = aGc{^,d) where a is some proportionality factor. We avoid this 
choice as it implies the dependence of the error on the design parameters which contradicts 
our assumptions for the lower bound derivation. 
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5.4.2. Results. We run the SPSA algorithm in order to find d that minimizes [/£, there¬ 
fore maximizes the lower bound estimate. To assess the performance of the algorithm, we run 
several cases where the choices of N and M in the Monte Carlo loop vary. It is easy to see 
that the variance of our estimator is 


(5.11) 


var 


um 


j^var [p(y|K,d)] 


which shows that N and M have the same contribution in controlling the variance, therefore we 
take them equal. As stated previously, our estimator is unbiased, unlike the dlMC estimator, 
so an extremely large value for M is not necessary as it is for dlMC to maintain a low bias. 
We considered three different cases where N = M = 10,100 and 1000. Since the results of 
SPSA are noisy, in order to better evaluate the performance of the algorithm we obtain the 
results of 100 independent runs for each case. For the first two cases we run the algorithm 
for a total of 10^ evaluations of the objective function and for the last case we run for 5 • 10^ 
evaluations. It appears that these choices are rather high since the iterates are stabilized much 
earlier maintaining a low iteration error. 

The resulting 5-tuples of points in Dq are shown all together in a total of 500 points 
in Fig. 9, for all three cases. This gives us a qualitative idea about where the expected 
information gain is maximized and so data should be collected from. The formation of a 
certain pattern as N, M are increased is obvious and the optimal points appear to be very 
close to where the contaminant sources were placed. At the same time various points that 
appear to be outliers are also present, a characteristic behavior of the SPSA algorithm. The 
case N = M = 10 particularly shows all points widely spread all over Dq and one can only 
distinguish the two areas around (x, y) = (150, 250) and around (x, y) = (450,100) where 
more points are accumulated. The cases N = M = 100 and N = M = 1000 provide very 
similar conclusions with the majority of the runs having converged at very specific locations, 
close to the actual contaminant sources, forming a clear pattern with the main difference that 
in the third case the points are even more concentrated and less outliers are present. 

For a more quantitative argument we also present the exact values of U^{d) as well as 
those of 


(5.12) 


Udd) 


{m + log [(27r)™|S|]} - log 


1 

NM 


N,M 




calculated at all outcomes d* of our runs. These values quantify the information gain that 
each design is expected to offer and at the end, if we had to choose only one design, this should 
be the design that provides the largest value. This time, for diagnostic reasons and in order 
to provide a measure of comparison, we want to have a fixed accuracy for the estimates and 
we use the same number of samples N = M = 1000 for the evaluation of the two objective 
functions on all points. Their values are all shown in the histograms presented in Fig. 10. 
Again there is a significant difference between the quality of the results of the first case with 
that of the last two cases for both functions. For 172(d), the first case gives us designs that 
whose values vary in a range from 0.1 • 10^ to 6 • lO^'^ (observe that the x-axis of the histogram 
is an order of magnitude larger than the rest) with a large number of them being away from 
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(a) N=M=10 



(b) N=M=10 



Final (d) value 


(C)A^=A/=100 (di)N = M=lOQ 




(e)N = M=mO (f) iV=M = 1000 




Figure 10: Histograms of the Ul{d) (left) and ^^(d) (right) values for all optimal points d* 


the minimum value. The impression one might get for this case is that chances are few that 
the optimization will yield an optimal design solution and not just an outlier. The other 
two cases provide similar results with the third being, as expected, better than the second, 
in the sense that the optimal points are even more concentrated around the minimum value. 
For completeness we present also UL{d) which is technically nothing else but the negative 
logarithm of 172(d), shifted by a constant. In the first case, the values cover a range from 
—3 to 2.5 with a mean around 0. The last two cases give us again similar results with both 
histograms covering values from around 1 to 3.5, a much narrower interval than the first case. 
Again we can see that the third case is slightly better than the second where we see that the 
unimodal structure of the histogram is more concentrated on higher values close to 3 which 
implies that a few more runs eventually converged to the global maximum. 
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Table 2: The (x, ?/)-coordinates of the 5-point designs do and and their values. 



di 

^2 

ds 

di 

ds 

UUd) 

do 

do 

(127.96,259.65) 

(22.95,171.37) 

(442.59,97.05) 

(75.36,300.00) 

(143.22,304.41) 
(318.58,169.27) 

(337.11,279.02) 
(425.06,109.96) 

(412.26,135.86) 

(441.00,253.70) 

3.698 

0.286 


5.5. Bayesian update of log(«;). In order to validate our methodology and demonstrate 
the significance of the experimental design analysis developed above, we perform Bayesian 
inference based on two different designs that are selected according to the results of the 
previous section. More precisely, data from two different sets of 5 points, denoted with do and 
do, is collected and used to update the probability distribution of k. Specifically, do is selected 
among the 100 optimal designs that were approximated by SPSA in the previous section, for 
N = M = 1000 and is the one that achieved the maximum Uiid) value, so based on the 
previous analysis it is the design that is expected to yield the best posterior. The second 
design do, chosen for comparison only, was arbitrarily selected among the optimal designs 
that were returned by SPSA using the poor estimate with N = M = 10 and has a much lower 
UL{d) value, therefore it is expected to update to a wider posterior. The exact location of the 
points for each set and their [/^(d) values are shown in Table 2. 

Since the posterior distributions cannot be calculated explicitly, we use Markov Chain 
Monte Carlo (MCMC) methods to draw samples from them. MCMC methods rely on gener¬ 
ating a sequence of random variables based on a Markov Chain that converges to a stationary 
distribution 7r(y), called the target distribution and therefore the sequence generated after a 
burn-in period, can be thought of as following the stationary distribution tt. To avoid strong 
autocorrelation among the samples, appropriate thinning and tuning of the algorithm is re¬ 
quired. A powerful MCMC method is the Metropolis-Bastings (MB) algorithm that was first 
proposed by Metropolis [25] and later generalized by Bastings [13]. The MB algorithm at an 
arbitrary step generates a new sample y from a proposal distribution q{x,y) given that x is 
the last accepted sample and accepts it with probability 


(5.13) 


a{x, y) = min 


'^{y)q{x,y) ] 

’ 'n'{x)q{y,x) j ■ 


In our case we use the adaptive version of the MB algorithm as developed in [10] with 
a Gaussian as the proposal distribution where its covariance matrix is updated at each step 
taking into account all the previous samples that have been drawn. This implies that the 
chain is non-Markovian, however it has been proved that it has the correct ergodic properties. 
The adaptive MB (aMB) algorithm provides faster convergence to the target distribution and 
achieves lower autocorrelation among the samples without very large thinning. 

Below we explore two cases where in the first data is observed from a reference held that 
is associated with one specihc realization of the model output while in the second, data is 
observed from a reference held that is constructed using Gaussian process (GP) regression on 
measurements taken from the real site. 


5.5.1. Case 1: Reference data generated from prior. Bere we assume a hypothetical 
situation where the real permeability values are those displayed on Table 3. Our model’s 
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Figure 11: Concentration map for the 5 top layers of the domain used as the reference field 
in example 1. The ’o’ and ’o’ signs indicate the locations of do and do designs respectively, 
from where measurements were taken. 


Table 3: True values of k and log(K:) used for generating the data in case 1. 


i 

log Ki 

Ki (cm^) 

1 

-22.195 

2.295 • lO-^'J 

2 

-20.791 

9.346 • 10"^° 

3 

-21.567 

4.300 • 10-1° 

4 

-25.193 

1.145 • 10-11 

5 

-25.272 

1.058 • 10-11 

6 

-24.610 

2.050 • 10-1° 


output for those values can be directly evaluated and the 5 top layers are displayed in Fig. 11. 
With this reference field assumed to be our ’’reality”, we collect data from the locations 
indicated at designs do and do and use them for our computations. 

We generate 50000 samples with a 10000-sample burn-in period and we retain a sample 
every 5 steps. After 8000 samples are obtained, histograms of their values are shown in 
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Fig. 12. Note that the histograms display the marginal distribution of each log/tj, i = 1, ...,6 
which are no longer independent themselves, however comparison of the histograms for each 
case together with their priors and the true value gives us an idea about the different results 
obtained for each design. As expected, we observe that do provides narrower posteriors that 
do. This can be observed particularly on logK 2 and log/ts and in a smaller scale on logKi, 
logKs and logK 4 . The posteriors of log/te for both designs do not display any significant 
discrepancy from the prior but even in this case the one corresponding to do appear to be 
slightly decentralized towards the true value. In addition, for do three posteriors, namely 
log K 2 , log ^5 and log kq retain their gaussian bell-shaped form and it seems that almost no new 
information has been gained about their values. This is actually a consequence of the fact that 
the corresponding soils (silty sand, sand with 10% silt and clay) are not present in the locations 
where the data was collected from, for this design. Note that clay is present in the lower layers 
of our domain and not much information is gained about it from do either. Another general 
characteristic is that, even for the soils for which both designs provide similar posteriors, the 
maximum a posteriori values of do show excellent agreement with the true values while those 
of do show a slight discrepancy which makes them inappropriate for estimation. Overall, we 
conclude that the design do provides significantly better inference results that those provided 
by do and this conclusion can be generalized for the comparison of do with any other design 
that achieves a lower C/i(d) value. 



Figure 12: Comparison of the histograms of the marginal posteriors of all log(Kj), i = 1 ,..., 6 for 
the designs do (green) and do (red). The red curve indicates their common prior distribution 
and the dashed black line indicates the true value used to generate the data. 


5.5.2. Case 2: Reference data from field measurements. In this case, the data used 
in the likelihood function which is incorporated in the acceptance probability through Bayes’ 
rule, is taken from a reference concentration field, that was generated by GP regression based 
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Figure 13: Concentration map for the 5 top layers of the domain obtained by GP regression on 
real data, used as our reference field in example 2. The ’o’ and ’o’ signs indicate the locations 
of do and do designs respectively, from where measurements were taken. 


on measurements taken from the field data at the specified locations as shown in the soil 
boring map in Fig. 1. These measurements, denoted with f, consist of 1240 points and their 
locations form a 1240 x 3 matrix X. Then, if f* denotes the concentrations over the discretized 
domain, that is f* £ with corresponding locations X*, then 

(5.14) 

f*|X*, X, f ~ AA (X(X*, X)X(X, X)-^f, X(X,, X*) - X(X,, X)X(X, X)-^K(X, X*)) , 

where X(X*,X) is the covariance matrix with (X(X*,X))jj = k(xj,Xj), x* G X*, xj G X 
and k(-, •) denotes some appropriately chosen kernel (for more details on the derivation of this 
conditional distribution one can see [29]). In our case we chose a squared exponential kernel, 
given as 

(5.15) k(x, x') = (T^exp 

which was fit to the data with variance cr^ = 5 and correlation lengths (^i, -^ 2 ) -^s) = (30, 30,10). 
Our reference field, consists of the mean f = X(X*, X)X(X, X)~^f and the values of the 5 


2 ^ 
n=l 


(Xn - <)" 




































26 


PANAGIOTIS TSILIFIS, ROGER G. GHANEM AND PARIS HAJALI 



Value Value Value 

Figure 14: Comparison of the histograms of the marginal posteriors of all log(Kj), i = 1,6 for 
the designs do (green) and (red). The red curve indicates their common prior distribution. 


top layers are shown in Fig. 13. 

Again, we set the posteriors with p(K:|y, do) and p(«:|y,do) as our target distributions 
and this time we generate 100000 samples with a 10000-sample burn-in period and we re¬ 
tain a sample every 5 steps. After 18000 samples are obtained, histograms of the marginal 
distributions of each logAtj, i = 1,...,6, are shown in Fig. 14. Again, we observe that do 
provides narrower posteriors than do for most of the cases. This can be observed particularly 
on logKi, logK 4 and log^s. The posterior of log Kg for the optimal design is almost identical 
to its prior which implies that nothing new has been learned about its true value. As in the 
previous example, this is due to the fact that the corresponding soil (clay) is not present in 
the locations where the data was collected, for this design (recall that clay is present in the 
lower layers of our domain). In this case, since our data is obtained from a procedure other 
than evaluating our forward model, it cannot be seen as a direct output of it and generally 
we do not expect to observe uniqueness of the input parameters that can potentially give rise 
to our measurements. This is reflected by the fact that different designs give posteriors that 
seem to concentrate around different values. Bayesian inference however, still provides us with 
informative results as we observe the posterior distributions to be much further away from 
the priors, thus learning something about how close our model and our prior assumptions are 
to the actual reality. 

6. Conclusions. We presented an experimental design framework that focuses on provid¬ 
ing optimal solutions to the design problem in terms of maximizing the information on model 
parameters from Bayesian inference. This was achieved by employing an information theoretic 
criterion, namely the expected relative entropy between the prior and the posterior distribu- 
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tions of the unknown parameters. An additional reformulation of the design criterion through 
the derivation of a lower bound, was of great importance in order to address biasedness issues 
as well as alleviate the computational burden associated with optimization. 

The framework was applied to a real-world problem: The problem of permeability identifi¬ 
cation in contaminated soils in the presence of experimental field data. The forward model in 
this case consisted of a system of differential equations that describe contaminant transport in 
porous media, namely a two-component, two-phase flow-and-transport model that provides as 
its output the concentrations of the pollutants at the time when data is collected. The design 
parameters consist of multiple locations where the concentrations will be measured. In the 
present setting, with the analogous adopted code (TOUGH2) used as a black box simulator, 
no derivative information was available to the optimization process. The SPSA algorithm, 
in addition to addressing this issue also seemed to accelerate convergence to the optimal so¬ 
lutions. To further reduce the computational cost, construction of a model surrogate was of 
crucial importance since the implementation of such a procedure would be impractical oth¬ 
erwise. Finally, our methodology was validated after the inference results showed that the 
data collected at the optimal design locations were much more informative than the data ob¬ 
tained from other arbitrarily chosen points, in the sense that they resulted in much narrower 
posteriors, thus gaining more knowledge about the true situation in the subsurface. We ob¬ 
serve from our analysis that, under limited resources, the performance of a Bayesian update 
depends significantly on the location of data acquisition. This highlights the need for optimal 
monitoring layouts in order to manage environmental risks under economic constraints. 

Appendix A. Comparison of the expected information gain estimates. 

A.l. The double-loop Monte-Carlo estimate. The expected information gain can be 
rearranged in the form 


/(d) = 


T JIC 


{log d)] - log [p(y|d)]} p(y|K, d)p{K)dKdy 


and can be evaluated by using Monte Carlo methods with 


1 ^ 


i=l 


M 




i=i 


where {k*}, i = j = 1,...,M are i.i.d. samples from p{k) and {y®} are i.i.d. 

samples from p(y|K®, d). 

A.2. Properties of the estimates. The properties of /(d) are explored in [31]. Precisely 
it is shown that the variance is proportional to 
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and the bias is proportional to 
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where ^(d), -B(d) and C'(d) are terms that depend on the likelihood and evidence function. 
Clearly, as mentioned above, N controls the variance and M controls the bias. 

For 17£(d) we have 
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var [p(y|0,d)] 


and the bias is trivially zero, whereas for Ui^d) we have 
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where the second line follows after Ist-order Taylor expansion about U’l{d) and 
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_ 1 uar [p(y|d,d)] 

~ NM C/£(d)2 

where the third line follows after a 2nd-order Taylor expansion. Here the variance and the 
bias of the lower bound are controlled by both N and M. 


Appendix B. Governing equations for flow and transport in a porous medium. 

B.l. Balance equations. The general mass- and energy balance equations in a multicom¬ 
ponent (NK components) multiphase problem are given as 


^ = V-F" + /, k = l,...,NK, 

where k is labeling the mass component up to a total oi NK, M represents mass or energy per 
volume, F represents mass or heat flux and q denotes sinks or sources. In some formulations, 
such as in TOUGH2, an integral expression is used in the form 

M'^dVn= [ F'^-ndTn+ [ q’^dVn, k = l,...,NK 
at JVn JTn JVn 

where Vn is an arbitrary subdomain of the flow system and T^ its closed boundary surface, 
n is a vector normal on the surface element dVn pointing inward into Vn- If TOUGH2 is 
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used for the numerical solution of the governing equations, module EOS7r provides consistent 
characterization of the constitutive equations, allowing a value of NK as high as 5 (water, 
air, brine, parent radionuclide, daughter radionuclide). In out problem, a zero source term 
is assumed resulting in g = 0 and the contaminant sources in the model are taken to be 
localized on the surface, at pre-assigned spatial locations shown in Fig. 7. In a TOUGH2 
formalism, these sources are modeled as injections in attached zero-volume blocks connected 
to the gridblocks where the source is assumed to be. 

The mass accumulation condition for the kth component is 

/9 

where the summation is taken over the fluid phases (liquid, gas, non-aqueous phase liquids). 
Only liquid and gas phase are considered in our case. The porosity is denoted by 0, S’y? is the 
saturation of phase /3 (takes values from 0 to 1), is the density of phase /3 and is the 
mass fraction of component k present in phase 13. 

The advective mass flux is given as 

F k \ ^ vk-v^ 

adv — / 

/9 

and individual phase fluxed are given by a multiphase version of Darcy’s law: 

F/3 = PpUp = - Pi3g). 

Here is the Darcy velocity (volume flux) in phase (3, k is the absolute permeability, Hrp is 
relative permeability to phase /?, pp is viscosity and 


Pp = P + Pcp 


is the fluid pressure in phase (3, which is the sum of the pressure P of a reference phase (usually 
taken to be the gas phase) and the capillary pressure Pep (< 0), g is the vector of gravitational 
acceleration. Vapor pressure lowering due to capillary and phase adsorption effects is modeled 
by Kelvin’s equation 

P,{T,Si) = fvPL{T,Si)Psat{T) 


where 


fvPL =exp 


MMSi) 

P/P(T-F 273.15) 


is the vapor pressure lowering factor. Phase adsorption is neglected. The saturated vapor 
pressure of bulk aqueous phase is denoted by Pgat, Pd is the difference between aqueous and 
gas phase pressures, is the molecular weight of water and R is the universal gas constant. 
Temperature T is assumed constant. 

The absolute permeability of the gas phase increases at low pressures according to the 
relation given by Klinkenberg 

Kj — 
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where Kqo is the permeability at ’’infinite” pressure and b is the Klinkenberg parameter. In 
addition to Darcy’s flow, mass flux occurs by molecular diffusion according to 

Here is the molecular diffusion coefficient for component k in phase /3, ToTp is the tortuosity 
which includes a porous medium dependent factor tq and a coefficient that depends on phase 
saturation Sp, Tp = Tp{Sp). Finally, the total mass flux is finally given by 

0 


B.2. Relative permeability model. The relative permeability is assumed to follow the 
van Genuchten-Mualem model which for liquid is 


and for gas is 


hlrl — 


1 


1 1, 



1 2 


Si < Sis 

Si > Sis 


Krg 


1 l^rli '^f Sgr — 0 

( 1 - 5 )' ( 1 - 52 ) , ifSgr >0 


subject to restriction 0 < Hrh i^rg < 1- Here S* = {Si — Sir)/{Sis — Sir) and S = {Si — Sir){l — 
Sir - Sgr). 


B.3. Capillary pressure model. For the capillary pressure we use the van Genuchten 
function given as 

Pcap = -Po{s*-^/^-iy~^ 

subject to restrictions —Fmax < Pcap < 0. Again S* is defined as for the relative permeability. 


B.4. Parameters. Table 1 displays the nominal values assigned to all parameters required 
according to the above governing equations. It is important also to note the following for our 
simulation: 

1. The mobilities are upstream weighted. 

2. The permeabilities are harmonic weighted. 

3. Module EOS7r currently neglects brine and daughter radionuclide, resulting in a 3- 
component flow. 

4. Adsorption effects are neglected. 

5. Biodegradation effects are neglected. 
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